The lungs are the main organs of the respiratory system and serve as the site where oxygen and carbon dioxide are exchanged. Because lung function is so important, signs of lung disorders must be detected and diagnosed early. Research on the classification of lung conditions generally uses chest x-ray image data, which requires a time-consuming procedure to obtain. In this research, an embedded system to diagnose lung conditions was designed. The system is easy to use independently and provides real-time examination results. It uses body temperature, oxygen saturation, fingernail color and lung volume as parameters for classifying lung conditions. The system can classify three conditions: healthy lungs, pneumonia and tuberculosis. The k-nearest neighbor method was used in the classification process. The dataset consisted of 51 samples obtained from hospitals, each already labeled with a lung condition based on the doctor's diagnosis. The proposed system has an accuracy of 88.24% in classifying lung conditions.

Keywords: Embedded system, K-nearest neighbor, Pneumonia, Tuberculosis
1. INTRODUCTION

The lungs are vital organs in the human body and play an important role in the respiratory and circulatory systems [1]. The main function of the lungs is to exchange oxygen and carbon dioxide in the bloodstream. This exchange occurs in the alveoli, the tiny air sacs in the lungs. During inspiration, the alveoli receive oxygen, which moves into the bloodstream through the capillaries. The heart then pumps the oxygen-rich blood throughout the body. Meanwhile, the carbon dioxide that the alveoli receive from the bloodstream is expelled from the body during expiration [2]. Disorders of the alveoli therefore destabilize the circulatory system. The most common disease of the alveoli is pneumonia, an infection, typically caused by viruses or bacteria, that inflames the alveoli. As a result of this inflammation, the alveoli fill with fluid and pus, leaving them unable to supply sufficient oxygen to the body [3].

Besides pneumonia, tuberculosis (TB) is another dangerous disease that attacks the lungs. Tuberculosis is caused by Mycobacterium tuberculosis, which can lead to complications in the lungs [4]. According to the World Health Organization (WHO), 10 million people worldwide were infected with tuberculosis in 2018, with a mortality rate of 15% [5]. The mortality of lung diseases, both pneumonia and tuberculosis, can be reduced if the disease is detected and diagnosed at an early stage. Early diagnosis allows patients with lung disease to receive fast and precise treatment. It also reduces the risk of transmitting viruses and bacteria to susceptible people.

Early diagnosis of lung conditions is difficult to achieve with conventional tests such as sputum tests, spirometry, chest x-rays and CT scans [6]-[9], because these tests must be carried out in adequately equipped health care facilities and their results take a long time to obtain. Therefore, a more practical, real-time system is needed to detect and diagnose lung conditions. Such a system should be usable at any time, especially when early symptoms of lung disease appear, such as a persistent cough, fever, chest pain and shortness of breath. The diagnostic results obtained from this system can then serve as a reference for seeking further treatment from a doctor.

Early diagnosis of lung conditions gives the best results when appropriate parameters are used. Several vital parameters can be used to determine lung conditions, including body temperature, oxygen saturation, fingernail color and lung volume [10], [11]. People with lung diseases generally have an elevated body temperature: in response to the infection in the lungs, the hypothalamus in the brain signals the skin, muscles and organs to raise body temperature [12]. People with lung disease also have low oxygen saturation. Oxygen saturation is the ratio of oxygen-bound hemoglobin (oxyHb) to the total hemoglobin in the blood. Low oxygen saturation occurs because the lungs cannot work properly to meet the blood's oxygen demand [13]. A lack of oxygen in the blood can also be detected from the color of the fingernails. In healthy people, the fingernails are pink [14], whereas people with lung disease experience cyanosis, a condition in which the fingernails become pale and bluish due to a lack of oxygen in the blood. Furthermore, the lung volume of people with lung disease is reduced, because viruses and bacteria in the lungs decrease the elasticity of the lung tissue [15].

Liebenlito [16] performed research on the early diagnosis of lung conditions using chest x-ray images. Karnkawinpong [17] also used chest x-ray images to classify pulmonary tuberculosis lesions. In both studies, the classification system is based on computer-aided diagnosis (CAD). Relying on chest x-ray images and a computer is less practical for early detection, because patients must first undergo an x-ray examination at a health care facility, which takes time. Therefore, in this research we propose an embedded system that provides real-time classification results for lung conditions. The system diagnoses lung conditions from body temperature, oxygen saturation, fingernail color and lung volume, four parameters that can be acquired easily and non-invasively. Each parameter is measured with a dedicated sensor: body temperature with the MLX90614 temperature sensor, oxygen saturation with the MAX30100 pulse oximetry sensor, fingernail color with the TCS3200 color sensor and lung volume with a flex sensor.

This research uses the k-nearest neighbor (KNN) method for classification. KNN was chosen because it is robust to noisy data. Qin [18] used this method to classify chronic kidney disease with an accuracy of 99.25%. Wang [19] reported an accuracy of 99.67% for KNN in diagnosing epilepsy from electroencephalogram (EEG) signals, and Shaharum [20] used KNN to classify asthma severity from wheezing sounds with an accuracy of 97.5%. In this research, we apply KNN to classify healthy lungs, pneumonia and tuberculosis based on body temperature, oxygen saturation, fingernail color and lung volume.


2. PROPOSED METHOD 

The purpose of this research was to design and implement a system that detects lung conditions early and classifies them as healthy lung, pneumonia or tuberculosis. The diagnosis is based on measurements from the temperature sensor, pulse oximetry sensor, color sensor and flex sensor, each of which is acquired non-invasively without the help of medical personnel. Classification is performed with the KNN method on an Arduino Mega, and the results are displayed on a liquid crystal display (LCD). Because the KNN method requires all training data to be stored on the microcontroller, a microcontroller with a large memory was used. The Arduino Mega has 256 KB of flash memory, 8 KB of SRAM and 4 KB of EEPROM, which is sufficient for the data used in this research.
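The memory claim above can be checked with a back-of-the-envelope estimate. The sketch below assumes that each feature value is stored as a 4-byte float (the size of both `float` and `double` on 8-bit AVR boards) and each class label as one byte; these storage choices are assumptions, not details given in the text.

```python
# Rough estimate of the memory needed to keep the KNN training set on the
# Arduino Mega. Assumptions: 4-byte floats for features, 1-byte labels.
FLOAT_BYTES = 4            # assumed storage per feature value
LABEL_BYTES = 1            # assumed storage per class label
N_TRAIN = 34               # training samples (2:1 split of 51 samples)
N_FEATURES = 6             # Temp, SpO2, R, G, B, Flex

SRAM_BYTES = 8 * 1024      # Arduino Mega SRAM
FLASH_BYTES = 256 * 1024   # Arduino Mega flash


def training_set_bytes(n_samples, n_features,
                       value_bytes=FLOAT_BYTES, label_bytes=LABEL_BYTES):
    """Bytes for the feature matrix plus one label per sample."""
    return n_samples * (n_features * value_bytes + label_bytes)


needed = training_set_bytes(N_TRAIN, N_FEATURES)
print(needed)                                # 850
print(f"{needed / SRAM_BYTES:.1%} of SRAM")  # 10.4% of SRAM
```

Even if the training set were copied into SRAM rather than kept in flash, it would occupy only a small fraction of the available 8 KB under these assumptions.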





Classification of lung condition for early diagnosis of pneumonia and tuberculosis based... (Rizal Maulana) 


1264 O ISSN: 2302-9285 


2.1. Body temperature measurement 

The MLX90614 temperature sensor was used to measure body temperature in the designed system. Gu [21] used this sensor to design a wearable device for monitoring the health of the elderly. The MLX90614 is a contactless temperature sensor that measures the infrared radiation emitted by an object, which increases with the object's temperature. A component called a thermopile converts this radiation energy into a voltage. The MLX90614 sensor has an accuracy of 0.5°C and a resolution of 0.02°C.
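The 0.02°C resolution corresponds to how the sensor reports temperature. The sketch below assumes, following the MLX90614 datasheet, that the object-temperature register holds a 16-bit value in units of 0.02 K; the function name `raw_to_celsius` is hypothetical and the I2C read itself is not shown.

```python
# Sketch: converting an MLX90614 object-temperature reading to degrees Celsius.
# Assumption (MLX90614 datasheet): one register LSB equals 0.02 K.
KELVIN_PER_LSB = 0.02
KELVIN_OFFSET = 273.15


def raw_to_celsius(raw: int) -> float:
    """Convert a raw register value (0.02 K per LSB) to degrees Celsius."""
    return raw * KELVIN_PER_LSB - KELVIN_OFFSET


# 15493 * 0.02 K = 309.86 K, i.e. 36.71 degrees Celsius
print(round(raw_to_celsius(15493), 2))  # 36.71
```

Because one LSB is 0.02 K and a kelvin step equals a Celsius step, the register format directly explains the stated 0.02°C resolution.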

As shown in Figure 1, the main components of the system were assembled in a black box. The temperature sensor and flex sensor were placed outside the black box to simplify the measurement process, while the microcontroller, pulse oximetry sensor and color sensor were placed inside it. Two additional components were used: a pushbutton and an LCD. The pushbutton triggers the measurement of each sensor, and the LCD displays the measurement and classification results. Body temperature is measured by pointing the MLX90614 sensor at the subject's forehead. To obtain accurate results, the distance between the sensor and the forehead must be less than 5 cm.


Figure 1. Sensor placement design (labeled parts: color sensor, pulse oximetry sensor, push button, fingertips)


2.2. Oxygen saturation measurement 

In this research, the MAX30100 pulse oximetry sensor was used to measure oxygen saturation. Xuedan [22] used this sensor to detect oxygen saturation at several parts of the body. Oxygen saturation is the percentage of oxygen-bound hemoglobin (oxyHb) relative to the total hemoglobin in the blood. The MAX30100 sensor module has two LEDs and a photodetector. One LED emits red light with a wavelength of 650 nm and the other emits infrared light with a wavelength of 950 nm. Both light sources illuminate the part of the body where oxygen saturation is measured. Some of the light is absorbed by hemoglobin and some is reflected. OxyHb absorbs more infrared light than red light, while hemoglobin without oxygen (deoxyHb) absorbs more red light than infrared light. The reflected light is received by the photodetector, and the ratio between the reflected red and infrared light is used to calculate the oxygen saturation percentage. As shown in Figure 1, the MAX30100 sensor was placed in a hole in the black box, into which the fingertip to be measured is inserted. The sensor was mounted at the bottom of the hole so that it is in direct contact with the finger. The box is black to reduce ambient light interference, which can affect sensor accuracy.
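The red/infrared comparison described above is usually expressed as a "ratio of ratios". The sketch below is illustrative only: it assumes the pulsatile (AC) and steady (DC) components of each channel have already been extracted, and it maps the ratio to SpO2 with the frequently quoted textbook line SpO2 ≈ 110 - 25·R. That line is an assumption, not the MAX30100 calibration, which in practice is device-specific.

```python
# Sketch of the "ratio of ratios" used in pulse oximetry.
# Assumptions: AC/DC components are already extracted per channel, and
# SpO2 = 110 - 25 * R is used as an illustrative empirical calibration.


def ratio_of_ratios(ac_red, dc_red, ac_ir, dc_ir):
    """R = (AC_red / DC_red) / (AC_ir / DC_ir)."""
    return (ac_red / dc_red) / (ac_ir / dc_ir)


def estimate_spo2(r, a=110.0, b=25.0):
    """Linear calibration with assumed constants, clamped to 0-100 %."""
    return max(0.0, min(100.0, a - b * r))


r = ratio_of_ratios(ac_red=0.5, dc_red=100.0, ac_ir=1.0, dc_ir=100.0)
print(r)                 # 0.5
print(estimate_spo2(r))  # 97.5
```

A lower R means relatively less red absorption, which corresponds to more oxyHb and thus higher saturation, matching the absorption behavior described in the text.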


2.3. Fingernail color detection 

The TCS3200 color sensor was used to detect fingernail color in the designed system. Ragul [23] used this sensor to diagnose health parameters from urine color. The TCS3200 sensor module consists of 4 LEDs and an array of 64 photodiodes: 16 with red filters, 16 with green filters, 16 with blue filters and 16 without filters. The LEDs illuminate the object; some of the light is absorbed by the object and some is reflected onto the photodiode array. The red, green and blue filter groups are activated one after another to detect the red, green and blue components of the object's color. Each filter group is selected by setting the logic levels of the sensor's selector pins. The photodiodes generate a current proportional to the intensity of the received light, and this current is converted into a frequency. To obtain the RGB color value of the object, a frequency counter program is needed on the microcontroller [24]. As shown in Figure 1, the TCS3200 sensor was mounted at the top of the hole in the black box. With this placement, the sensor directly detects the color of the fingernail of the finger inserted into the hole, yielding the RGB values of the fingernail color.
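The steps from filter selection to RGB value can be sketched as follows. The S2/S3 truth table follows the TCS3200 datasheet; the linear mapping between a black and a white reference, and all calibration frequencies, are hypothetical choices for illustration, not the authors' calibration.

```python
# Sketch: turning TCS3200 output frequencies into RGB values.
# Filter selection per the TCS3200 datasheet (S2, S3 logic levels).
FILTER_SELECT = {
    "red": (0, 0),
    "blue": (0, 1),
    "clear": (1, 0),
    "green": (1, 1),
}

# Hypothetical calibration (Hz) per channel: (black reference, white reference)
CALIBRATION = {
    "red": (2000, 12000),
    "green": (1800, 11000),
    "blue": (2200, 13000),
}


def frequency_to_channel(freq_hz, f_black, f_white):
    """Linearly map a frequency to 0-255 between two reference frequencies."""
    value = (freq_hz - f_black) / (f_white - f_black) * 255
    return int(round(max(0.0, min(255.0, value))))


def frequencies_to_rgb(freqs):
    """freqs: measured frequency (Hz) for 'red', 'green' and 'blue'."""
    return tuple(frequency_to_channel(freqs[c], *CALIBRATION[c])
                 for c in ("red", "green", "blue"))


print(frequencies_to_rgb({"red": 10000, "green": 8000, "blue": 9500}))
# (204, 172, 172)
```

The two-point calibration compensates for the different sensitivities of the red, green and blue photodiode groups; real calibration values would have to be measured with the actual sensor and enclosure.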


2.4. Lung volume measurement 

Lung volume in this research was measured using a flex sensor. Kristiani [25] used this sensor to measure respiratory rate with an accuracy of 97.7%. The flex sensor is a variable resistor whose resistance changes when it bends. A signal conditioning circuit is required to process this resistance change. The first stage of signal conditioning converts the resistance of the flex sensor into a voltage, which can be done with a Wheatstone bridge circuit. Because the Wheatstone bridge produces only a small voltage, generally on the order of millivolts, a second stage using a differential amplifier is needed to amplify the bridge output.
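The two conditioning stages can be illustrated numerically. All component values below (5 V excitation, 25 kΩ fixed resistors equal to an assumed flat flex-sensor resistance, and a gain of 100) are hypothetical, and the amplifier is treated as an ideal difference amplifier.

```python
# Sketch of the two-stage flex-sensor signal conditioning.
# Hypothetical values: 5 V excitation, 25 kOhm fixed resistors, gain 100.
V_EXC = 5.0          # bridge excitation voltage (V), assumed
R_FIXED = 25_000.0   # fixed bridge resistors (ohm), assumed


def bridge_output(r_flex, r1=R_FIXED, r2=R_FIXED, r3=R_FIXED, v_exc=V_EXC):
    """Differential output of a Wheatstone bridge with the flex sensor in one arm.

    Left divider: r1 on top, r_flex at the bottom.
    Right divider: r2 on top, r3 at the bottom.
    """
    return v_exc * (r_flex / (r1 + r_flex) - r3 / (r2 + r3))


def diff_amp(v_diff, r_f=100_000.0, r_in=1_000.0):
    """Ideal difference amplifier: Vout = (Rf / Rin) * (V+ - V-)."""
    return (r_f / r_in) * v_diff


v_bridge = bridge_output(25_500.0)      # sensor bent slightly (+500 ohm)
print(f"{v_bridge * 1000:.1f} mV")      # 24.8 mV
print(f"{diff_amp(v_bridge):.2f} V")    # 2.48 V
```

A balanced bridge (flex resistance equal to the fixed resistors) outputs 0 V, and a 2% resistance change yields only tens of millivolts, which is why the amplification stage is needed before the ADC.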

As shown in Figure 1, the flex sensor was attached to a flexible belt outside the black box. The belt is tied around the chest with the sensor at the front, as shown in Figure 2. The lung volume measured in this research was the tidal volume, which is the volume of air moved into or out of the lungs during normal inhalation and exhalation. During inhalation, the chest expands and bends the flex sensor. This increases the resistance of the flex sensor and, proportionally, the output voltage of the signal conditioning circuit. Conversely, during exhalation the chest deflates and the flex sensor returns to its initial shape, which decreases the output voltage. Lung volume was measured as the difference in output voltage between inhalation and exhalation. The lung volume processed by the system is therefore a digital value, obtained by converting the output voltage with the microcontroller's analog-to-digital converter (ADC).
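The lung volume feature can be sketched as a peak-to-trough difference in ADC counts. The sketch assumes the Arduino Mega's 10-bit ADC with a 5 V reference and a sampling window containing one full breath; the sample voltages and function names are hypothetical.

```python
# Sketch: computing the "Flex" feature from conditioned voltages.
# Assumptions: 10-bit ADC, 5 V reference, one full breath in the window.
ADC_BITS = 10
V_REF = 5.0


def to_adc(voltage):
    """Quantize like a 10-bit ADC (code = Vin * 1024 / Vref, clamped to 0-1023)."""
    code = int(voltage / V_REF * (2 ** ADC_BITS))
    return max(0, min(2 ** ADC_BITS - 1, code))


def lung_volume_feature(voltages):
    """Peak-to-trough ADC difference over one breathing window (tidal-volume proxy)."""
    codes = [to_adc(v) for v in voltages]
    return max(codes) - min(codes)


# Hypothetical samples (V) over one breath: rest -> inhale -> exhale
window = [2.40, 2.48, 2.56, 2.62, 2.55, 2.45, 2.41]
print(lung_volume_feature(window))  # 45
```

The result is a unitless count rather than a volume in liters, which is consistent with the Flex values reported in Table 1.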

Figure 2. Flexible belt attachment at the chest (labeled parts: flex sensor, flexible belt)


2.5. Data acquisition process 

The data acquisition process for each sensor is triggered by the system's pushbutton. The first press triggers the body temperature measurement by the MLX90614 sensor, which provides the body temperature feature in degrees Celsius (°C). The second press triggers the oxygen saturation measurement by the MAX30100 sensor, which provides the oxygen saturation feature in percent (%). The third press acquires the fingernail color with the TCS3200 sensor, providing the fingernail color features as R, G and B values. The fourth and final press acquires the lung volume feature from the flex sensor. Once all feature values have been obtained, they are classified with the KNN method on the Arduino Mega, and the classification result is displayed on a 16x2 LCD. The data acquisition process is illustrated in Figure 3.
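The four-press sequence behaves like a small state machine. In the sketch below, the `read_*` functions are hypothetical stand-ins for the sensor drivers; they return the values of row 1 of Table 1 so the control flow can be run without hardware.

```python
# Sketch of the four-press acquisition sequence.
# The read_* functions are hypothetical stubs, not real sensor drivers.
def read_temperature():
    return 36.7              # MLX90614, degrees Celsius (stub)


def read_spo2():
    return 96                # MAX30100, percent (stub)


def read_nail_rgb():
    return (203, 156, 151)   # TCS3200, R, G, B (stub)


def read_lung_volume():
    return 44                # flex sensor, ADC counts (stub)


STEPS = [("temp", read_temperature), ("spo2", read_spo2),
         ("rgb", read_nail_rgb), ("flex", read_lung_volume)]


class Acquisition:
    """Each pushbutton press runs the next measurement in a fixed order."""

    def __init__(self):
        self.presses = 0
        self.features = {}

    def on_button_press(self):
        """Run the next step; return True once all four features are ready."""
        if self.presses >= len(STEPS):
            return True      # extra presses are ignored
        name, read = STEPS[self.presses]
        self.features[name] = read()
        self.presses += 1
        return self.presses == len(STEPS)

    def feature_vector(self):
        """Six features in the order Temp, SpO2, R, G, B, Flex."""
        r, g, b = self.features["rgb"]
        f = self.features
        return [f["temp"], f["spo2"], r, g, b, f["flex"]]


acq = Acquisition()
ready = False
while not ready:
    ready = acq.on_button_press()
print(acq.feature_vector())  # [36.7, 96, 203, 156, 151, 44]
```

Once `on_button_press` reports that all features are ready, the feature vector would be passed to the KNN classifier.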


2.6. Dataset 

The dataset used in this research was obtained from hospitals in Malang and Pasuruan, Indonesia. It consists of primary data obtained by measuring the parameter values directly on patients. Each sample has six features: body temperature, oxygen saturation, the R, G and B values of the fingernail color, and lung volume. Each sample is labeled as healthy lung, pneumonia or tuberculosis based on the doctor's diagnosis. The dataset consists of 51 samples: 24 healthy lung, 12 pneumonia and 15 tuberculosis. Data collection was limited by the Covid-19 outbreak, because the primary data required direct contact with patients with lung disease, so we could only collect the 51 samples obtained before the pandemic. A small dataset can be enlarged with several methods, one of which generates additional data from the mean and standard deviation of the collected data.
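One way to generate additional data from the mean and standard deviation is to sample each feature from a normal distribution fitted per class. The sketch below is an illustrative assumption: the text does not state that the reported results used augmented data, and drawing each feature independently ignores correlations between features. The function name `augment_class` is hypothetical.

```python
# Sketch: enlarging one class from its per-feature mean and standard deviation.
# Assumption: features are drawn independently from N(mean, stdev).
import random
import statistics


def augment_class(samples, n_new, seed=0):
    """Generate n_new synthetic feature vectors for a single class.

    samples: list of feature vectors (at least two) from one class.
    """
    rng = random.Random(seed)
    columns = list(zip(*samples))
    means = [statistics.mean(col) for col in columns]
    stdevs = [statistics.stdev(col) for col in columns]  # sample stdev
    return [[rng.gauss(m, s) for m, s in zip(means, stdevs)]
            for _ in range(n_new)]


# First three healthy rows of Table 1 (Temp, SpO2, R, G, B, Flex)
healthy = [
    [36.7, 96, 203, 156, 151, 44],
    [36.4, 95, 203, 164, 167, 42],
    [36.4, 97, 204, 154, 153, 35],
]
synthetic = augment_class(healthy, n_new=5)
print(len(synthetic), len(synthetic[0]))  # 5 6
```

Synthetic samples of this kind should only be added to the training set, so that the test data still consist of real measurements.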
Figure 3. Data acquisition process





2.7. Classification of lung condition using k-nearest neighbor 

KNN is a supervised learning classification method, which requires labeled training data for each class. The labels serve as the learning basis for classifying future data. KNN assigns a sample to the class that appears most often among its k nearest neighbors, where k is the number of neighbors used. A neighbor is a training sample at the smallest distance from the test sample. The most commonly used measure of the distance between training and test data is the Euclidean distance [26]. In general, the KNN method proceeds as follows:
a. Determine the value of k.
b. Compute the distance between the test data and each training data using the Euclidean distance in (1)

$$ d = \sqrt{\sum_{i=1}^{n} (x_i - y_i)^2} \tag{1} $$

where d is the Euclidean distance, x is the training data, y is the test data, i is the feature index and n is the number of features.
c. Sort the Euclidean distances in ascending order.
d. Take the majority class among the k nearest neighbors as the classification result.
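Steps a to d can be sketched as a minimal classifier. This is a generic KNN sketch, not the authors' firmware; the tie-breaking rule (the tied class whose member ranks nearest wins) is an assumption, since the text does not specify one, and the two-feature training data are a toy example.

```python
# Minimal k-nearest-neighbor classifier following steps a-d.
# Assumption: vote ties go to the tied class whose member ranks nearest.
import math
from collections import Counter


def euclidean(x, y):
    """Equation (1): square root of the summed squared feature differences."""
    return math.sqrt(sum((xi - yi) ** 2 for xi, yi in zip(x, y)))


def knn_classify(train_X, train_y, test_x, k):
    # b. distance from the test sample to every training sample
    distances = [(euclidean(x, test_x), label) for x, label in zip(train_X, train_y)]
    # c. sort distances in ascending order
    distances.sort(key=lambda pair: pair[0])
    # d. majority class among the k nearest neighbors
    nearest = [label for _, label in distances[:k]]
    counts = Counter(nearest)
    best = max(counts.values())
    return next(label for label in nearest if counts[label] == best)


# Toy two-feature training data (already normalized)
train_X = [[0.1, 0.2], [0.2, 0.1], [0.9, 0.8], [0.8, 0.9], [0.85, 0.85]]
train_y = ["Healthy", "Healthy", "Pneumonia", "Pneumonia", "Pneumonia"]
print(knn_classify(train_X, train_y, [0.15, 0.15], k=3))  # Healthy
```

Using an odd k, as in the tests with k=3, 5 and 7, avoids ties between two classes, although ties remain possible with three classes.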
The classification of lung conditions using KNN is shown in the flowchart in Figure 4. The system inputs are the feature values obtained from each sensor: the body temperature feature from the MLX90614 sensor, the oxygen saturation feature from the MAX30100 sensor, the RGB features from the TCS3200 sensor and the lung volume feature from the flex sensor. As mentioned before, KNN is a supervised learning method, so training data are also required as system input, together with the value of k, which determines the number of nearest neighbors used. The system first normalizes all feature data with the min-max method based on the formula in (2).


$$ X_{norm} = \frac{X - X_{min}}{X_{max} - X_{min}} \tag{2} $$


where X_norm is the normalized value, X is the raw value, and X_min and X_max are the minimum and maximum values of the raw data, respectively. This normalization balances the ranges of all features used in the Euclidean distance calculation. Without normalization, features with a narrow range would have little effect on the distance compared with features with a wide range. After normalization, all feature values lie between 0 and 1. The Euclidean distance between the test data and each training data is then calculated, and the distances are sorted in ascending order. The final step of the KNN process determines the classification result from the majority class among the k nearest neighbors. Three classes were defined in this research: healthy lung, pneumonia and tuberculosis.
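Equation (2) can be applied column-wise as follows. The sketch assumes that X_min and X_max are taken from the training data and reused for each test sample (so a test value outside the training range can fall slightly outside 0 to 1); the text does not say which data the minima and maxima come from. Rows 1, 9 and 14 of Table 1 serve as toy training data.

```python
# Sketch of Equation (2), min-max normalization, applied per feature.
# Assumption: X_min and X_max come from the training data.


def fit_min_max(train_X):
    """Per-feature minimum and maximum of the training data."""
    columns = list(zip(*train_X))
    return [min(c) for c in columns], [max(c) for c in columns]


def min_max_normalize(x, mins, maxs):
    """X_norm = (X - X_min) / (X_max - X_min); constant features map to 0."""
    return [(v - lo) / (hi - lo) if hi != lo else 0.0
            for v, lo, hi in zip(x, mins, maxs)]


# Rows 1, 9 and 14 of Table 1 (Temp, SpO2, R, G, B, Flex)
train_X = [
    [36.7, 96, 203, 156, 151, 44],
    [37.5, 92, 218, 195, 197, 36],
    [37.3, 93, 206, 183, 182, 38],
]
mins, maxs = fit_min_max(train_X)
print([round(v, 3) for v in min_max_normalize(train_X[2], mins, maxs)])
```

Without this step, a temperature difference of 0.8°C would be negligible next to RGB differences of tens of units in Equation (1); after normalization, both contribute on the same 0 to 1 scale.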
Figure 4. Flowchart of the system classification process























3. RESULTS AND DISCUSSION 

The first test determines the accuracy of the system in classifying healthy lung, pneumonia and tuberculosis conditions. Computation time was also measured to determine how quickly the system provides classification results. Data acquisition was carried out in a hospital, accompanied by a doctor as an expert. First, the doctor examined the patients and diagnosed each patient's lung condition based on the medical record data. Next, body temperature, oxygen saturation, fingernail color and lung volume were acquired using the designed system, and these data were used as input to the KNN classification method. The 51 samples were divided into training and test data at a 2:1 ratio, giving 34 training samples and 17 test samples. The accuracy of the system was obtained by comparing its classification results with the doctor's diagnosis. The test was conducted with three k values, k=3, k=5 and k=7, to determine the optimal k for the classification process. The results of the accuracy test are shown in Table 1.

As shown in Table 1, six features are used in the KNN classification with three different k values: body temperature (Temp), oxygen saturation (SpO2), the three fingernail color features (R, G, B) and lung volume (Flex). The doctor's diagnosis in Table 1 was determined from the patient's medical record, not from these six features. With k=3, 4 test samples were classified differently from the doctor's diagnosis, which serves as the reference, giving an accuracy of 76.47%. The best accuracy was obtained with k=5 and k=7, for which only 2 samples were misclassified, giving an accuracy of 88.24% for both k values.
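The reported accuracies follow directly from the misclassification counts. The sketch below recomputes them, taking the 17 test samples and the error counts (4 for k=3, 2 for k=5 and k=7) from the text; the helper name `accuracy_percent` is hypothetical.

```python
# Recomputing the accuracies in Table 1 from the misclassification counts.
N_TEST = 17


def accuracy_percent(n_correct, n_total=N_TEST):
    """Share of test samples matching the doctor's diagnosis, in percent."""
    return round(100 * n_correct / n_total, 2)


for k, n_wrong in [(3, 4), (5, 2), (7, 2)]:
    print(f"k={k}: {accuracy_percent(N_TEST - n_wrong)}%")
# k=3: 76.47%
# k=5: 88.24%
# k=7: 88.24%
```

With only 17 test samples, each misclassification changes the accuracy by about 5.9 percentage points, which is worth keeping in mind when comparing k values.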

The second test measures the computation time of the system. This test used the two k values with the highest accuracy in the previous test, k=5 and k=7. The computation time measured is the time the KNN method needs to classify each test sample: it starts when the values of all parameters have been collected and ends when the classification result is obtained. The average computation time for each k value is shown in Table 2.

The average computation time was 912 ms for k=5 and 1031 ms for k=7, so the smaller k value reduced the computation time. This happens because a smaller k reduces the number of votes counted in the KNN method. These results show that the designed system can provide classification results in real time.

The authors previously performed research on the early diagnosis of lung conditions (Maulana [14]). That research used only body temperature and fingernail color, processed with the Naive Bayes classification method, and achieved an accuracy of 84.21%. The present research adds the oxygen saturation and lung volume features and uses the KNN method for classification. Although its computation time is 78 ms longer than that of the previous research, the present system classifies lung conditions more accurately.


Table 1. KNN classification results using three different k values

No  Temp (°C)  SpO2 (%)  R    G    B    Flex  Doctor's diagnosis  k=3           k=5           k=7
1   36.7       96        203  156  151  44    Healthy             Healthy       Healthy       Healthy
2   36.4       95        203  164  167  42    Healthy             Healthy       Healthy       Healthy
3   36.4       97        204  154  153  35    Healthy             Healthy       Healthy       Healthy
4   36.4       96        205  169  171  58    Healthy             Healthy       Healthy       Healthy
5   36.4       95        208  178  176  42    Healthy             Tuberculosis  Tuberculosis  Tuberculosis
6   36.8       96        210  170  165  58    Healthy             Healthy       Healthy       Healthy
7   36.5       97        213  183  188  65    Healthy             [illegible]   Healthy       Healthy
8   36.4       98        213  184  182  35    Healthy             Healthy       Healthy       Healthy
9   37.5       92        218  195  197  36    Pneumonia           Pneumonia     Pneumonia     Pneumonia
10  37.5       93        222  207  212  37    Pneumonia           Pneumonia     Pneumonia     Pneumonia
11  37.5       92        223  200  194  29    Pneumonia           Pneumonia     Pneumonia     Pneumonia
12  37.4       92        223  202  202  36    Pneumonia           Pneumonia     Pneumonia     Pneumonia
13  37.2       95        214  197  200  46    Tuberculosis        [illegible]   Pneumonia     Pneumonia
14  37.3       93        206  183  182  38    Tuberculosis        Tuberculosis  Tuberculosis  Tuberculosis
15  37.4       93        215  191  180  32    Tuberculosis        Tuberculosis  Tuberculosis  Tuberculosis
16  37.1       93        216  182  182  36    Tuberculosis        Tuberculosis  Tuberculosis  Tuberculosis
17  37.2       94        223  191  185  38    Tuberculosis        [illegible]   Tuberculosis  Tuberculosis








Table 2. Average computation time at each k value

k   Average computation time (ms)
5   912
7   1031





4. CONCLUSION 

Detecting lung conditions is very important because the lungs have a vital function in the human body. This research proposed a system that detects and classifies lung conditions in real time. The classification is based on body temperature, oxygen saturation, fingernail color and lung volume, four parameters that are closely related to lung conditions. The KNN method was used to classify healthy lung, pneumonia and tuberculosis conditions, using a dataset of 51 samples obtained from hospitals, each labeled according to the doctor's diagnosis. With training and test data split at a 2:1 ratio, the system classifies lung conditions with an accuracy of 88.24% and a computation time of 912 ms using k=5. To improve accuracy, the dataset should be enlarged. The lung volume measurement should also be improved, because in this research it was limited to tidal volume.